Efficient random number generation for update events in multi-bank conditional branch predictor

ABSTRACT

A branch predictor has a plurality of memory banks having entries that hold prediction information used to predict a direction of branch instructions fetched and executed by a processor that comprises the branch predictor. A count of events that occur in the processor is provided to hardware logic that performs an arithmetic and/or logical operation, e.g., XOR, on predetermined bits of the count to generate a random value. In response to the processor determining a correct direction of a branch instruction predicted by the branch predictor, the branch predictor uses the random value generated by the hardware logic to make a decision about updating the memory banks. Bits of a branch history pattern, along with the count, may also be used to generate the random value. The event counted may be a retire of an instruction or a cycle of a core or bus clock.

CROSS REFERENCE TO RELATED APPLICATION(S)

This application claims priority to China Application No. 201611013466.0, filed Nov. 17, 2016, which is hereby incorporated by reference in its entirety.

BACKGROUND

The need for increased prediction accuracy of branch instructions is well known in the art of processor design. The need has grown even greater with the increase of processor pipeline lengths, cache memory latencies, and superscalar instruction issue widths. Branch instruction prediction involves predicting the target address and, in the case of a conditional branch instruction, the direction, i.e., taken or not taken.

One popular conditional branch instruction direction predictor is commonly referred to as a TAGE predictor. TAGE is an acronym for TAgged GEometric length predictor, and the predictor has been described in various papers authored by Andre Seznec. The TAGE predictor includes multiple memory banks used to store branch prediction information. Each bank of the predictor except one default bank is indexed with a hash of the program counter and a length of the branch history pattern; the default bank is indexed by only the program counter. To generate the index for each of the non-default banks, a different length of the branch history pattern is hashed, and the lengths form a geometric series; hence GEometric length. Additionally, each entry in each bank includes a tag that is compared with tag bits of the program counter to determine whether a hit occurred in the bank; hence TAgged.

As the papers describe, the TAGE predictors designed by Seznec have been entered in various branch prediction contests with significant success. The contests are based on software simulation of the branch predictors. The TAGE papers describe various ways the banks are updated in a probabilistic fashion.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a block diagram illustrating a multi-bank conditional branch instruction predictor.

FIG. 2 is a block diagram of an entry of a memory bank of FIG. 1.

FIG. 3 is a flowchart illustrating operation of the branch predictor of FIG. 1.

FIG. 4 is a block diagram illustrating a multi-bank conditional branch instruction predictor according to an alternate embodiment.

FIG. 5 is a block diagram illustrating a multi-bank conditional branch instruction predictor according to an alternate embodiment.

FIG. 6 is a flowchart illustrating operation of a branch predictor to make a decision about updating useful indicators.

DETAILED DESCRIPTION OF THE EMBODIMENTS

Referring now to FIG. 1, a block diagram illustrating a multi-bank conditional branch instruction predictor 100, or branch predictor 100, is shown. The branch predictor 100 is included in a processor for predicting the direction of conditional branch instructions that may be present in a block of instruction bytes fetched from an instruction cache of the processor. In one embodiment, the branch predictor 100 is an improvement on a conventional TAGE predictor. However, the embodiments are not limited to a TAGE predictor, and the embodiments may include other multi-bank predictors that utilize random numbers to make decisions related to updating the banks.

The branch predictor 100 includes an instruction counter 102, a program counter (PC) 104, a branch history pattern (BHP) 106, random number generation (RNG) logic 108, hashing logic 112, control logic 114, comparison and selection logic 116, and a plurality of pairs of muxes 122 and memory banks 124. FIG. 1 illustrates four pairs of muxes 122 and banks 124, denoted mux 0 122-0 and bank 0 124-0, mux 1 122-1 and bank 1 124-1, mux 2 122-2 and bank 2 124-2, and mux N 122-N and bank N 124-N. The number of banks 124 may vary in different embodiments. Each mux 122 receives two respective inputs 132 and 134 and generates a respective output 136. Each bank 124 receives on its index input the output 136 of its respective mux 122, as well as a respective entry update 138 from the control logic 114, and provides a selected entry (e.g., entry 200 of FIG. 2) on its respective output 139 to the comparison and selection logic 116. Preferably, the PC 104 is the architectural program counter, or instruction pointer, of the processor that specifies an address at which a block of instruction bytes is fetched from the instruction cache.

The RNG logic 108 receives the instruction counter 102 and the branch history pattern 106 and performs one or more arithmetic and/or logical operations on selected bits of one or both of them to generate one or more random numbers 148, which are provided to the control logic 114. The control logic 114 uses the random numbers 148 to make decisions about updating the memory banks 124, as described in more detail herein. The RNG logic 108 comprises combinatorial logic that performs the arithmetic and/or logical operations on the selected bits of one or both of the instruction counter 102 and the branch history pattern 106. Examples of the arithmetic and/or logical operations include, but are not limited to: selection of predetermined bits of an entity; Boolean logical operations, including exclusive-OR (XOR), NAND, AND, OR, and NOT, and bit-manipulation operations such as rotate and shift; and arithmetic operations, including addition, subtraction, multiplication, division, and modulo.

The instruction counter 102 is a counter that counts instruction events. Preferably, the instruction counter 102 increments each clock cycle by the number of architectural instructions retired by the processor during the clock cycle. Alternatively, the instruction counter 102 increments each clock cycle by the number of microinstructions retired by the processor during the clock cycle. Furthermore, alternate embodiments are described below with respect to FIGS. 4 and 5 in which bits of a count of events other than instruction events are provided to the RNG logic 108 for use in generating the random numbers 148. The instruction counter 102 bits are denoted IC[x:y] in FIG. 1, where [x:y] signifies a range of bits of the instruction counter 102. In one embodiment, the RNG logic 108 generates a random number 148 by performing a Boolean exclusive-OR (XOR) operation on bits IC[15:8] with bits IC[7:0] to generate an 8-bit result, which is denoted RANDOM1 in FIG. 1 as shown.
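For illustration only, the RANDOM1 computation can be modeled behaviorally in Python. This is a sketch rather than the hardware. The helper name `bits` and the per-cycle retire counts in the usage lines are hypothetical.

```python
def bits(value: int, msb: int, lsb: int) -> int:
    """Return value[msb:lsb] as an unsigned integer, like a Verilog part-select."""
    width = msb - lsb + 1
    return (value >> lsb) & ((1 << width) - 1)


def random1(ic: int) -> int:
    """RANDOM1 = IC[15:8] ^ IC[7:0]: an 8-bit value in the range 0-255."""
    return bits(ic, 15, 8) ^ bits(ic, 7, 0)


# The instruction counter advances by the number of instructions retired each
# cycle (hypothetical retire counts); RANDOM1 is recomputed from its current value.
ic = 0x1230
for retired in (4, 0, 3, 2):
    ic += retired
    print(hex(ic), random1(ic))
```

Because only XOR gates on existing counter bits are involved, the result is available combinationally, with no separate generator state.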

The branch history pattern 106, also referred to by other terms such as the global history register (GHR), is an N-bit shift register. As the processor sees a conditional branch instruction, the processor shifts the direction of the conditional branch instruction into the shift register. In one embodiment, taken corresponds to a binary one and not taken to a binary zero. Thus, the branch history pattern 106 keeps track of the direction history of the last N conditional branches seen by the processor. In one embodiment, a conditional branch instruction is seen if it is retired. Alternatively, a conditional branch instruction is seen if the processor predicts that it is present in the block of instruction bytes fetched from the instruction cache, at a location within the block at or after the current PC 104 value, but not after a conditional branch predicted as taken. In one embodiment, N is approximately 100. The branch history pattern 106 bits are denoted BHP[x:y] in FIG. 1, where [x:y] signifies a range of bits of the branch history pattern 106. In one embodiment, the RNG logic 108 generates a random number 148 by performing a Boolean exclusive-OR (XOR) operation on bits IC[15:8], bits IC[7:0], and bits BHP[msb:msb-7] to generate an 8-bit result. This result is denoted RANDOM2 in FIG. 1 as shown, where msb refers to the most significant bit of the branch history pattern 106.
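The following Python sketch extends the same behavioral model to the branch history pattern and RANDOM2. Two details are assumptions: the 100-bit width, taken from the "approximately 100" above, and the choice to shift new outcomes in at the least significant bit. Under that second assumption, BHP[msb:msb-7] holds the oldest eight recorded directions.

```python
BHP_BITS = 100  # "approximately 100" per the text; the exact width is assumed


def bits(value: int, msb: int, lsb: int) -> int:
    """Return value[msb:lsb] as an unsigned integer, like a Verilog part-select."""
    return (value >> lsb) & ((1 << (msb - lsb + 1)) - 1)


def shift_in(bhp: int, taken: bool, n: int = BHP_BITS) -> int:
    """Shift the newest direction (1 = taken, 0 = not taken) into bit 0.

    Assumption: new outcomes enter at the LSB, so the oldest outcome is
    dropped from bit n-1.
    """
    return ((bhp << 1) | int(taken)) & ((1 << n) - 1)


def random2(ic: int, bhp: int, n: int = BHP_BITS) -> int:
    """RANDOM2 = IC[15:8] ^ IC[7:0] ^ BHP[msb:msb-7]: an 8-bit value."""
    msb = n - 1
    return bits(ic, 15, 8) ^ bits(ic, 7, 0) ^ bits(bhp, msb, msb - 7)


bhp = 0
for taken in (True, True, False, True):  # hypothetical branch outcomes
    bhp = shift_in(bhp, taken)
print(bin(bhp), random2(0x1234, bhp))  # 0b1101 38
```

In this model, only four outcomes have been recorded, so BHP[99:92] is still zero and RANDOM2 equals RANDOM1 (0x26). Once the register fills, the history bits perturb the counter-derived value.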

Although embodiments have been described in which the random numbers 148 generated by the RNG logic 108 are 8 bits, other embodiments are contemplated in which the size of the random numbers 148 is different and different bits of the instruction counter 102 and/or branch history pattern 106 are used. For example, in one embodiment the random numbers 148 are 10 bits, e.g., RANDOM1 = IC[19:10] ^ IC[9:0] and RANDOM2 = IC[19:10] ^ IC[9:0] ^ BHP[MSB:MSB-9], where ^ denotes bitwise XOR. It should also be understood that bits of the instruction counter 102 and/or branch history pattern 106 other than those of the embodiments described here may be used, e.g., RANDOM1 = IC[22:13] ^ IC[9:0] and RANDOM2 = IC[30:21] ^ IC[13:4] ^ BHP[40:31].
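Each of these variants XORs equal-width bit fields together, so a single parameterized Python helper can express all of them. The sketch below is hypothetical. The helper `xor_fields`, the sample counter and history values, and the 100-bit history width used to locate MSB are illustrative assumptions.

```python
def bits(value: int, msb: int, lsb: int) -> int:
    """Return value[msb:lsb] as an unsigned integer, like a Verilog part-select."""
    return (value >> lsb) & ((1 << (msb - lsb + 1)) - 1)


def xor_fields(*fields: tuple[int, int, int]) -> int:
    """XOR together equal-width bit fields, each given as (value, msb, lsb)."""
    widths = {msb - lsb + 1 for _, msb, lsb in fields}
    if len(widths) != 1:
        raise ValueError("all fields must have the same width")
    result = 0
    for value, msb, lsb in fields:
        result ^= bits(value, msb, lsb)
    return result


ic, bhp = 0x7F3A5C1E, 0x5A5A5A5A5A  # hypothetical counter and history values
BHP_MSB = 99                         # hypothetical: a 100-bit branch history pattern

# 10-bit variants listed in the text.
r1 = xor_fields((ic, 19, 10), (ic, 9, 0))
r2 = xor_fields((ic, 19, 10), (ic, 9, 0), (bhp, BHP_MSB, BHP_MSB - 9))
r1_alt = xor_fields((ic, 22, 13), (ic, 9, 0))
r2_alt = xor_fields((ic, 30, 21), (ic, 13, 4), (bhp, 40, 31))
```

The width check reflects the equations above: every operand field in a given RANDOM equation has the same number of bits as the result.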

The branch predictor 100 makes decisions about whether and how to update the memory banks 124 using random numbers 148 generated by the RNG logic 108. Advantageously, the RNG logic 108 embodiments described herein generate the random numbers 148 very efficiently. Furthermore, because the RNG logic 108 is ordinary combinatorial logic, software simulation tools can model it exactly. A simulation therefore need not substitute a random number generator provided by the tools (e.g., the $random system function in Verilog), which may enable more accurate performance modeling of the branch predictor 100.

The hashing logic 112 hashes a portion of the program counter 104 with a portion of the branch history pattern 106 to generate a respective index 132 for each of the banks 124. The respective indexes 132 are denoted 132-0, 132-1, 132-2 and 132-N in FIG. 1 and are provided as a first input to mux 0 122-0, mux 1 122-1, mux 2 122-2 through mux N 122-N, respectively. In one embodiment, as performed by a TAGE predictor, the hashing logic 112 simply passes through the portion of the program counter 104 as index 132-0 rather than hashing it with the branch history pattern 106. The hashing logic 112 then hashes a different length of the branch history pattern 106 with the program counter 104 to generate each of the remaining indexes 132-1, 132-2 through 132-N. In one embodiment, the hashing logic 112 performs an XOR of lower bits of the program counter 104 with the respective selected length of bits of the branch history pattern 106 to generate the indexes 132-1, 132-2 through 132-N.
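A minimal Python sketch of this index generation follows. The index width and per-bank history lengths are hypothetical. The text does not say how a history longer than the index is reduced to index width, so the XOR-folding step here is an assumption. So is treating the low-order history bits as the newest.

```python
INDEX_BITS = 10                   # hypothetical bank index width
HISTORY_LENGTHS = (0, 8, 20, 50)  # hypothetical per-bank history lengths; bank 0 uses none


def fold(history: int, length: int, width: int) -> int:
    """XOR-fold the newest `length` history bits (assumed to be the low bits)
    down to `width` bits. Folding is an assumption made for this sketch."""
    h = history & ((1 << length) - 1)
    folded = 0
    while h:
        folded ^= h & ((1 << width) - 1)
        h >>= width
    return folded


def bank_indexes(pc: int, bhp: int) -> list[int]:
    """Indexes 132-0 .. 132-N: bank 0 gets the PC bits alone, others PC ^ history."""
    pc_low = pc & ((1 << INDEX_BITS) - 1)
    return [pc_low ^ fold(bhp, length, INDEX_BITS) for length in HISTORY_LENGTHS]
```

With a history length of zero the fold contributes nothing, so index 132-0 reduces to the pass-through PC bits described above.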

Each of the muxes 122 also receives on a second input a respective update index 134 from the control logic 114. The control logic 114 controls each of the muxes 122 to select either the index 132 generated by the hashing logic 112 or the update index 134 generated by the control logic 114, which the mux provides on its respective output 136 to the index input of the respective bank 124. When the control logic 114 wants to update a bank 124, it takes three steps. It generates a value on the respective update index 134 of the bank 124 to select the entry to update. It controls the respective mux 122 to select the update index 134. It then controls the bank 124 to write an update value 138 generated by the control logic 114. When the control logic 114 wants to read an entry from a bank 124, the control logic 114 controls the respective mux 122 to select the index 132, and in response the bank 124 provides the selected entry on its output 139 to the comparison and selection logic 116.
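The read/write path through a mux 122 can be sketched behaviorally in Python. The class `Bank` and its method `access` are hypothetical names chosen for illustration. The model collapses the mux select and bank write enable into a single `update_select` flag, which is a simplification.

```python
from typing import Any, Optional


class Bank:
    """Behavioral sketch of one memory bank 124 and its index mux 122."""

    def __init__(self, size: int) -> None:
        self.entries: list[Any] = [None] * size

    def access(self, hashed_index: int, update_index: int,
               update_select: bool, update_value: Any = None) -> Optional[Any]:
        # Mux 122: output 136 is the update index 134 when writing,
        # otherwise the hashed index 132.
        index = update_index if update_select else hashed_index
        if update_select:
            self.entries[index] = update_value  # write entry update 138
            return None
        return self.entries[index]              # read; drives output 139


bank = Bank(1024)
bank.access(hashed_index=0, update_index=17, update_select=True, update_value="entry")
print(bank.access(hashed_index=17, update_index=0, update_select=False))  # entry
```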

Referring briefly to FIG. 2, a block diagram of an entry 200 of a bank 124 of FIG. 1 is shown. Preferably, each entry 200 in each of the banks 124 includes a valid bit 202, a tag 206, a prediction 204, and a useful indicator 208. The valid bit 202 indicates whether or not the entry is valid. The tag 206 is upper bits of the address (i.e., program counter value) of the corresponding conditional branch instruction. The prediction 204 indicates whether the conditional branch instruction will be taken or not taken. Preferably, the entry 200 comprises a counter (e.g., a 3-bit saturating counter), and the prediction 204 is the most significant bit of the counter. In one embodiment, the counter is incremented when the conditional branch instruction is taken and decremented when it is not taken. In another embodiment, the counter is updated according to a state machine based on whether the prediction 204 provided by the entry was a correct prediction or a misprediction. The useful indicator 208 is an indication of whether or not the entry 200 has been useful in predicting the conditional branch instruction. In one embodiment, the useful indicator 208 is used by the branch predictor 100 to make decisions about whether or not to allocate the entry 200, as described in more detail below. In one embodiment, the useful indicator 208 comprises a single bit; alternatively, the useful indicator 208 comprises a multi-bit counter whose count indicates a degree of usefulness of the entry 200.
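A Python sketch of an entry 200 follows. It uses the 3-bit saturating counter and the first (increment/decrement) embodiment. The dataclass layout and the reset value of the counter are hypothetical; the text does not specify an initial counter value.

```python
from dataclasses import dataclass

CTR_BITS = 3                   # e.g., a 3-bit saturating counter
CTR_MAX = (1 << CTR_BITS) - 1  # 7


@dataclass
class Entry:
    """One entry 200: valid bit 202, tag 206, counter holding prediction 204, useful 208."""
    valid: bool = False
    tag: int = 0
    ctr: int = 1 << (CTR_BITS - 1)  # hypothetical reset value 0b100 (weakly taken)
    useful: int = 0

    @property
    def prediction(self) -> bool:
        """Prediction 204 is the counter's most significant bit (True = taken)."""
        return bool(self.ctr >> (CTR_BITS - 1))

    def train(self, taken: bool) -> None:
        """Increment on taken, decrement on not taken, saturating at 0 and CTR_MAX."""
        if taken:
            self.ctr = min(self.ctr + 1, CTR_MAX)
        else:
            self.ctr = max(self.ctr - 1, 0)


e = Entry(valid=True, tag=0x2A)
e.train(False)
print(e.ctr, e.prediction)  # 3 False
```

Taking the prediction from the most significant bit gives hysteresis. A strongly biased counter needs more than one opposite outcome before the prediction flips.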

Referring again to FIG. 1, when the branch predictor 100 is making a prediction, each bank 124 provides its respective selected entry 200 to the comparison and selection logic 116. Preferably, the comparison and selection logic 116 selects as the final prediction 142 the prediction provided by the entry 200 from the highest bank 124 having a valid tag 206 that matches the tag portion of the program counter 104. The highest bank 124 is the bank 124 whose index 132 has the longest branch history pattern 106 length used by the hashing logic 112. In one embodiment, the tag 206 stored in the entry is the upper bits of the address of the conditional branch instruction hashed with the branch history pattern 106. The tag portion of the program counter 104 is likewise hashed with the branch history pattern 106, and the comparison and selection logic 116 compares the two. The comparison and selection logic 116 indicates to the control logic 114 which of the banks 124 was selected to provide the final prediction 142.
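The selection rule above can be sketched in Python as follows. The sketch makes several assumptions:

- The entries have already been read out, one per bank, bank 0 first.
- Each entry is a hypothetical `ReadEntry` tuple.
- Bank 0 acts as an untagged default when no higher bank hits.

The hashed-tag variant is not modeled.

```python
from typing import NamedTuple


class ReadEntry(NamedTuple):
    valid: bool
    tag: int
    prediction: bool  # True = taken


def select_final_prediction(entries: list[ReadEntry], tag: int) -> tuple[int, bool]:
    """Return (bank X, final prediction 142).

    Scans from the highest bank (longest history) down and picks the first
    valid entry whose tag matches; falls back to bank 0, treated here as the
    untagged default bank (an assumption for this sketch).
    """
    for bank in range(len(entries) - 1, 0, -1):
        entry = entries[bank]
        if entry.valid and entry.tag == tag:
            return bank, entry.prediction
    return 0, entries[0].prediction


entries = [
    ReadEntry(True, 0, False),   # bank 0 (default)
    ReadEntry(True, 5, True),    # bank 1
    ReadEntry(False, 5, False),  # bank 2: tag matches but entry is invalid
    ReadEntry(True, 7, True),    # bank 3
]
print(select_final_prediction(entries, 5))  # (1, True)
```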

The control logic 114 also receives information 144 regarding each executed conditional branch instruction, such as its correct direction and its address. This information comes from an execution unit of the processor that executes conditional branch instructions. The control logic 114 maintains information about each predicted conditional branch instruction until it determines that the conditional branch instruction was executed or flushed from the processor pipeline. The control logic 114 combines the information it maintains about each predicted conditional branch instruction with the information 144 received from the execution unit. It then makes decisions about updating the memory banks 124 using the random numbers 148 generated from the instruction counter 102 and/or branch history pattern 106, as described in more detail below. In one embodiment, the processor includes a branch order table (BOT) that stores relevant information (including addresses) about in-flight branch instructions and operates similarly to a reorder buffer (ROB).

Preferably, the processor that includes the branch predictor 100 includes a fetch unit, an instruction cache, a branch target address cache, an instruction translator, and an execution pipeline. In one embodiment, the execution pipeline is a superscalar out-of-order execution pipeline that includes one or more architectural register files, a register renaming unit, a reorder buffer, reservation stations, a plurality of execution units, and an instruction scheduler for scheduling the issue of microinstructions to the execution units. The execution units may include one or more of the following execution unit types: integer unit, floating-point unit, media unit, single-instruction-multiple-data (SIMD) unit, branch execution unit, load unit, and store unit. Preferably, the processor also includes a memory subsystem that includes a memory order buffer, translation-lookaside buffers, a tablewalk engine, a cache memory hierarchy, and various request queues, e.g., one or more load queues, store queues, fill queues, and/or snoop queues. Preferably, the fetch unit generates a block address, based on the program counter value, which is provided to the instruction cache and the branch target address cache. In response to the block address, the instruction cache provides a block of architectural instruction bytes that may include one or more architectural branch instructions. Preferably, the instruction byte block is received by an instruction translator that translates the architectural instructions into microinstructions that are provided to the execution pipeline for execution.

Referring now to FIG. 3, a flowchart illustrating operation of the branch predictor 100 of FIG. 1 is shown. Flow begins at block 302.

At block 302, a block of instruction bytes that is predicted to include at least one conditional branch instruction is fetched from the instruction cache of the processor. Preferably, the branch target address cache predicts the presence of the conditional branch instruction by looking up the value of the program counter 104 of FIG. 1. Additionally, the hashing logic 112 hashes the program counter 104 value with various lengths of the branch history pattern 106 to generate the indexes 132 to apply to the banks 124 of FIG. 1. The selected entries 139 are provided to the comparison and selection logic 116, which selects a final prediction 142 that is provided to the execution pipeline of the processor. In particular, the branch predictor 100 selects as the final prediction 142 the prediction 204 from the entry of one of the banks 124. The bank 124 whose entry is selected is referred to in FIG. 3 as bank X. As described above, the comparison and selection logic 116 preferably selects the entry of the highest bank 124 having a valid tag that matches the tag portion of the program counter 104. The highest bank 124 is the bank 124 whose index 132 has the longest branch history pattern 106 length used by the hashing logic 112. Providing the final prediction 142 to the execution pipeline also lets the execution unit that executes the conditional branch instruction compare the prediction to the resolved correct direction of the conditional branch instruction. The execution unit then provides information 144 to the branch predictor 100 about whether or not the prediction 142 was correct. The branch predictor 100 uses the information 144 to update the banks 124. Flow proceeds to block 304.

At block 304, the execution unit executes the conditional branch instruction to resolve its correct direction, i.e., taken or not taken, and provides the branch predictor 100 with the correct direction 144. Flow proceeds to block 306.

At block 306, the control logic 114 determines that it needs to update one or more of the banks 124, so the RNG logic 108 generates the random numbers 148 of FIG. 1 using the instruction counter 102. As described above, the RNG logic 108 may also generate the random numbers 148 using both the instruction counter 102 and the branch history pattern 106. Flow proceeds to decision block 308.

At decision block 308, the control logic 114 determines whether or not the direction predicted by the branch predictor 100 matches the correct direction 144 provided by the execution unit at block 304. If so, flow proceeds to decision block 312; otherwise, flow proceeds to decision block 316.

At decision block 312, the control logic 114 examines the random number 148 generated by the RNG logic 108. If the random number 148 is in the range of values 8-255, flow proceeds to block 314; whereas, if the random number 148 is in the range 0-7, flow proceeds to decision block 316. In this manner, when the predicted direction matches the correct direction, the control logic 114 effectively decides whether or not to allocate a new entry according to a ratio of 1:31 (8 of the 256 possible values versus the remaining 248). Advantageously, the RNG logic 108 of FIG. 1 provides the random number 148 efficiently. Because it comprises easily simulated combinatorial logic, it also enables a more accurate measurement of the performance gains afforded by the random numbers 148 actually provided by the RNG logic 108. The alternative would be to measure with the random numbers provided by a random number generation function of the simulation tools, as described above.
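The 1:31 figure can be checked by enumerating every 8-bit value, as in the Python sketch below. The function name is hypothetical. The ratio describes actual allocation behavior only to the extent that the random numbers 148 are spread evenly over 0-255, which the text does not quantify.

```python
RANDOM_VALUES = range(256)  # all values of an 8-bit random number 148


def continue_to_allocation(r: int) -> bool:
    """Decision block 312 (prediction was correct): 0-7 -> block 316, 8-255 -> block 314."""
    return r < 8


go = sum(continue_to_allocation(r) for r in RANDOM_VALUES)
stop = len(RANDOM_VALUES) - go
print(go, stop)  # 8 248, a 1:31 ratio
```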

At block 314, the branch predictor 100 does not allocate a new entry in the banks 124, and flow ends at block 314.

At decision block 316, the control logic 114 examines the random number 148 generated by the RNG logic 108. If the random number 148 is in the range of values 0-63, flow proceeds to block 318; whereas, if the random number 148 is in the range 64-255, flow proceeds to block 322. In this manner, the control logic 114 effectively decides whether to start looking for a bank 124 from which to allocate at bank X+1 or at bank X+2, according to a ratio of 3:1 (192 values versus 64). In one embodiment, the random number 148 examined at decision block 316 is a second random number 148, i.e., different from the random number 148 examined at decision block 312. It should be understood that the ratios used by the branch predictor 100 based on the random numbers 148, e.g., at decision blocks 312, 316 and 324, are described as examples, and other embodiments are contemplated that use other ratios. Additionally, it should be understood that although embodiments are described in which the random numbers 148 generated and used are 8 bits, other embodiments are contemplated in which random numbers 148 of different sizes are generated and used.
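The following Python sketch of decision block 316 uses bank X = bank 1, matching the examples at blocks 318 and 322. The function name is hypothetical.

```python
def search_start_bank(bank_x: int, r: int) -> int:
    """Decision block 316: 0-63 -> start at bank X+2 (block 318), 64-255 -> bank X+1 (block 322)."""
    return bank_x + 2 if r < 64 else bank_x + 1


starts = [search_start_bank(1, r) for r in range(256)]  # bank X = bank 1
print(starts.count(2), starts.count(3))  # 192 64, a 3:1 ratio of X+1 to X+2
```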

At block 318, the control logic 114 starts at bank X+2 to find the first two banks 124 whose useful indicators 208 have a value of zero. For example, if bank X is bank 1 124-1 (i.e., the bank 124 that made the prediction as determined at block 302), then bank X+2 is bank 3 124-3. The branch predictor 100 may not be able to find two banks 124 that have zero-valued useful indicators 208, or even one bank 124 that has a zero-valued useful indicator 208. Furthermore, the branch predictor 100 may need only one bank 124 that has a zero-valued useful indicator 208, e.g., if flow proceeds to block 328. Flow proceeds to decision block 324.

At block 322, the control logic 114 starts at bank X+1 to find the first two banks 124 whose useful indicators 208 have a value of zero. For example, if bank X is bank 1 124-1 (i.e., the bank 124 that made the prediction as determined at block 302), then bank X+1 is bank 2 124-2. Flow proceeds to decision block 324.

At decision block 324, the control logic 114 examines the random number 148 generated by the RNG logic 108. If the random number 148 is in the range of values 0-15, flow proceeds to block 326; whereas, if the random number 148 is in the range 16-255, flow proceeds to block 328. In this manner, the control logic 114 effectively decides whether to allocate in one bank 124 or in two banks 124, according to a ratio of 15:1 (240 values versus 16). In one embodiment, the random number 148 examined at decision block 324 is a third random number 148, i.e., different from the random numbers 148 examined at decision blocks 312 and 316.
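Enumerating the 8-bit values again shows where the 15:1 ratio comes from. The Python function name below is hypothetical.

```python
def allocate_in_two_banks(r: int) -> bool:
    """Decision block 324: 0-15 -> both banks (block 326), 16-255 -> one bank (block 328)."""
    return r < 16


two = sum(allocate_in_two_banks(r) for r in range(256))
one = 256 - two
print(one, two)  # 240 16, a 15:1 ratio of one bank to two banks
```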

At block 326, the control logic 114 allocates a new entry for the conditional branch instruction in both of the two banks 124 found at block 318/322. Flow ends at block 326.

At block 328, the control logic 114 allocates a new entry for the conditional branch instruction in only one of the two banks 124 found at block 318/322. It chooses the one nearer to bank X, i.e., the bank 124 whose index uses the shorter branch history pattern 106 length. Flow ends at block 328.
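Blocks 318/322 and 326/328 can be sketched together in Python. The sketch takes the search start bank (from decision block 316) and the one-or-two decision (from decision block 324) as inputs. Function names and the sample useful values are hypothetical. When fewer than two candidates are found, the text leaves the behavior open, and this sketch simply allocates in whatever was found; that is an assumption.

```python
def find_zero_useful_banks(useful: list[int], start: int, want: int = 2) -> list[int]:
    """Blocks 318/322: from bank `start` upward, return up to `want` banks
    whose indexed entry has a zero useful indicator 208."""
    found = []
    for bank in range(start, len(useful)):
        if useful[bank] == 0:
            found.append(bank)
            if len(found) == want:
                break
    return found


def banks_to_allocate(useful: list[int], start: int, two_banks: bool) -> list[int]:
    """Blocks 326/328. With fewer than two candidates, allocate in whatever
    was found (an assumption; the text leaves this case open)."""
    found = find_zero_useful_banks(useful, start)
    # found is in ascending bank order, so found[0] is the candidate nearer to
    # bank X, i.e., the one indexed with the shorter branch history length.
    return found if two_banks else found[:1]


# useful[b]: useful indicator of the entry selected in bank b (hypothetical values).
useful = [0, 1, 0, 0, 1, 0]
print(banks_to_allocate(useful, start=2, two_banks=True))   # [2, 3]
print(banks_to_allocate(useful, start=2, two_banks=False))  # [2]
```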

Referring now to FIG. 4, a block diagram illustrating a multi-bank conditional branch instruction predictor 100 according to an alternate embodiment is shown. The branch predictor 100 of FIG. 4 is similar in many respects to the branch predictor 100 of FIG. 1, and like-numbered elements are similar. However, the branch predictor 100 of FIG. 4 replaces the instruction counter 102 of FIG. 1 with a core clock cycle counter (CCCC) 402. The CCCC 402 counts the number of clock cycles of a processing core of the processor, preferably since reset of the core. The CCCC 402 is provided to the RNG logic 108, which uses it to generate the random numbers 148. The random numbers 148 are provided to the control logic 114, which uses them to make decisions about updating the memory banks 124. Accordingly, the RANDOM1 and RANDOM2 equations shown in FIG. 4 are updated as RANDOM1 = CCCC[19:10] ^ CCCC[9:0] and RANDOM2 = CCCC[19:10] ^ CCCC[9:0] ^ BHP[MSB:MSB-9] to illustrate the use of the CCCC 402 rather than the instruction counter 102. The operation of the branch predictor 100 of FIG. 4 is similar to that described with respect to FIG. 3; however, at block 306 the RNG logic 108 uses the CCCC 402 bits rather than the instruction counter 102 bits to generate the random numbers 148.

Referring now to FIG. 5, a block diagram illustrating a multi-bank conditional branch instruction predictor 100 according to an alternate embodiment is shown. The branch predictor 100 of FIG. 5 is similar in many respects to the branch predictor 100 of FIG. 1, and like-numbered elements are similar. However, the branch predictor 100 of FIG. 5 replaces the instruction counter 102 of FIG. 1 with a bus clock cycle counter (BCCC) 502. The BCCC 502 counts the number of clock cycles of a bus external to the processor, preferably since reset of the processor. For example, the bus may be a system bus that couples the processor with peripherals and/or memory of the system. The BCCC 502 is provided to the RNG logic 108, which uses it to generate the random numbers 148. The random numbers 148 are provided to the control logic 114, which uses them to make decisions about updating the memory banks 124. Accordingly, the RANDOM1 and RANDOM2 equations shown in FIG. 5 are updated as RANDOM1 = BCCC[19:10] ^ BCCC[9:0] and RANDOM2 = BCCC[19:10] ^ BCCC[9:0] ^ BHP[MSB:MSB-9] to illustrate the use of the BCCC 502 rather than the instruction counter 102. The operation of the branch predictor 100 of FIG. 5 is similar to that described with respect to FIG. 3; however, at block 306 the RNG logic 108 uses the BCCC 502 bits rather than the instruction counter 102 bits to generate the random numbers 148.

Referring now to FIG. 6, a flowchart illustrating operation of the branch predictor 100 of FIG. 1 to make a decision about updating useful indicators 208 is shown. In one embodiment, each time the branch predictor 100 selects a bank 124 to make a prediction 142 (denoted bank X in FIG. 3), it also remembers an alternate prediction bank 124, referred to herein as bank Y. Bank Y is the bank 124 that would have been used to make the prediction 142 if there had been a miss in bank X. That is, bank Y is the next-lowest bank 124 in which the tag portion of the program counter 104 also hit. If there was no lower bank 124 in which there was a hit, then bank Y is the default bank 124, e.g., bank 0. Preferably, whenever bank X correctly predicts the direction of a conditional branch instruction and bank Y incorrectly predicts it, the control logic 114 increments the useful indicator 208 of the bank X entry that provided the prediction. As described with respect to FIG. 3, the useful indicator 208 is used to allocate entries in the banks 124. However, if the useful indicators 208 were only ever incremented and never decremented, eventually there would be no zero-valued useful indicators 208, which would negatively impact the allocation scheme. So, as described in the TAGE papers of Seznec, there is a need to age the useful indicators 208 to reset them to zero. Section 2.2 of Andre Seznec's paper entitled "A 64Kbytes ISL-TAGE branch predictor," published May 20, 2011, describes an 8-bit counter referred to as TICK. TICK dynamically monitors the number of successes and failures when trying to allocate a new entry after a misprediction. The TICK counter saturates when more failures than successes are encountered on allocations, and at that time Seznec's predictor resets all the useful bits of the predictor. FIG. 6 describes a variation on the Seznec scheme. Preferably, the control logic 114 of the branch predictor 100 of FIG. 1 also includes a counter (not shown) referred to herein as T. Use of T in conjunction with the random numbers 148 generated by the RNG logic 108 based on the instruction counter 102 (or CCCC 402 or BCCC 502) and/or the branch history pattern 106 is described with respect to FIG. 6. Flow begins at block 602.
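Determining bank Y and the useful-indicator increment can be sketched in Python as follows. The sketch represents tag hits as one boolean per bank, with bank 0 treated as the default and its flag ignored. Function names are hypothetical, and so is the 2-bit useful indicator; the text also allows a single bit.

```python
def provider_and_alternate(hits: list[bool]) -> tuple[int, int]:
    """Return (bank X, bank Y) from per-bank tag-hit flags.
    Bank 0 is the default bank, so hits[0] is ignored."""
    tagged_hits = [b for b in range(1, len(hits)) if hits[b]]
    x = tagged_hits[-1] if tagged_hits else 0
    lower_hits = [b for b in tagged_hits if b < x]
    y = lower_hits[-1] if lower_hits else 0
    return x, y


USEFUL_MAX = 3  # hypothetical 2-bit useful indicator


def update_useful(useful: int, pred_x: bool, pred_y: bool, taken: bool) -> int:
    """Increment bank X's useful indicator when X was right and Y was wrong."""
    if pred_x == taken and pred_y != taken:
        return min(useful + 1, USEFUL_MAX)
    return useful


print(provider_and_alternate([True, True, False, True]))  # (3, 1)
print(update_useful(0, pred_x=True, pred_y=False, taken=True))  # 1
```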

At block 602, in conjunction with the operations performed at block 318 or block 322, the control logic 114 determines two values, N and P. N is the number of banks 124 looked up (i.e., searched) at block 318/322 whose useful indicator 208 is zero. P is the number of looked-up banks 124 whose useful indicator 208 is non-zero. Flow proceeds to block 604.

At block 604, the control logic 114 increments the value of T by P and decrements the value of T by N. In one embodiment, T is a 10-bit counter, which thus has a range of 0-1023, and is initialized to zero upon reset of the processor. Flow proceeds to decision block 606.

At decision block 606, the control logic 114 determines whether the value of T is greater than or equal to the value of a random number 148. If not, flow ends; otherwise, flow proceeds to decision block 608. In one embodiment, the random number 148 compared at decision block 606 is a 7-bit random number 148 and thus has a range of 0-127.

At decision block 608, the control logic 114 examines a random number 148 generated by the RNG logic 108. If the random number 148 is in the range of values 0-127, flow ends; whereas, if the random number 148 is in the range 128-255, flow proceeds to block 612. In this manner, the control logic 114 effectively decides whether to decrement the useful indicators 208 according to a 1:1 ratio. In one embodiment, the random number 148 examined at decision block 608 is a second random number 148, i.e., different from the random number 148 examined at decision block 606. It should be understood that the ratio used by the branch predictor 100 based on the random numbers 148, e.g., at decision block 608, is described as an example, and other embodiments are contemplated that use other ratios. Additionally, it should be understood that although embodiments are described in which the random numbers 148 generated and used at blocks 606 and 608 are 7 bits and 8 bits, respectively, other embodiments are contemplated in which random numbers 148 of different sizes are generated and used.
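Blocks 602 through 608 can be sketched in Python as follows. The function reports whether flow reaches block 612; the exact decrement performed there is not modeled. Two elements are assumptions: the function names, and clamping T to 0-1023. The text gives T's range but not its overflow or underflow behavior. The block 602 results in the usage lines are hypothetical.

```python
T_MAX = (1 << 10) - 1  # T is a 10-bit counter (0-1023)


def update_t(t: int, n_zero: int, p_nonzero: int) -> int:
    """Block 604: T += P, T -= N. Clamping to 0-1023 is an assumption."""
    return max(0, min(T_MAX, t + p_nonzero - n_zero))


def reaches_block_612(t: int, r7: int, r8: int) -> bool:
    """Decision blocks 606 and 608.

    r7 is a 7-bit random number (0-127) compared against T at block 606;
    r8 is a separate 8-bit random number (0-255) examined at block 608.
    """
    if t < r7:          # block 606: T below the random threshold, flow ends
        return False
    return r8 >= 128    # block 608: 128-255 -> block 612, 0-127 -> flow ends


t = 0
for n_zero, p_nonzero in [(0, 2), (1, 2), (0, 3)]:  # hypothetical block 602 results
    t = update_t(t, n_zero, p_nonzero)
print(t, reaches_block_612(t, r7=4, r8=200))  # 6 True
```

Because r7 never exceeds 127, every block 606 comparison passes once T reaches 127. From then on, only the 1:1 test at block 608 gates the decrement.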

While various embodiments of the present invention have been described herein, it should be understood that they have been presented by way of example, and not limitation. It will be apparent to persons skilled in the relevant computer arts that various changes in form and detail can be made therein without departing from the scope of the invention. For example, software can enable the function, fabrication, modeling, simulation, description and/or testing of the apparatus and methods described herein. This can be accomplished through the use of general programming languages (e.g., C, C++), hardware description languages (HDL) including Verilog HDL, VHDL, and so on, or other available programs. Such software can be disposed in any known computer usable medium such as magnetic tape, semiconductor, magnetic disk, or optical disc (e.g., CD-ROM, DVD-ROM, etc.), a network, wire line or other communications medium. Embodiments of the apparatus and method described herein may be included in a semiconductor intellectual property core, such as a processor core (e.g., embodied, or specified, in an HDL) and transformed to hardware in the production of integrated circuits. Additionally, the apparatus and methods described herein may be embodied as a combination of hardware and software. Thus, the present invention should not be limited by any of the exemplary embodiments described herein, but should be defined only in accordance with the following claims and their equivalents. Specifically, the present invention may be implemented within a processor device that may be used in a general-purpose computer. Finally, those skilled in the art should appreciate that they can readily use the disclosed conception and specific embodiments as a basis for designing or modifying other structures for carrying out the same purposes of the present invention without departing from the scope of the invention as defined by the appended claims.

1. A branch predictor, comprising: a plurality of memory banks having entries that hold prediction information used to predict a direction of branch instructions fetched and executed by a processor that comprises the branch predictor; a count of events that occur in the processor; hardware logic that performs an arithmetic and/or logical operation on predetermined bits of the count to generate a random value; and wherein in response to the processor determining a correct direction of a branch instruction predicted by the branch predictor, the branch predictor uses the random value generated by the hardware logic to make a decision about updating the memory banks.
2. The branch predictor of claim 1, further comprising: a branch history pattern that specifies a history of directions of branch instructions encountered by the processor; and wherein the hardware logic, in addition to performing the arithmetic and/or logical operation on predetermined bits of the count, performs the arithmetic and/or logical operation also on predetermined bits of the branch history pattern to generate the random value.
3. The branch predictor of claim 1, further comprising: wherein the arithmetic and/or logical operation comprises a Boolean exclusive-OR (XOR) operation of a first portion of bits of the count with a second portion of bits of the count.
4. The branch predictor of claim 1, further comprising: wherein the event counted comprises a retire of an instruction by the processor.
5. The branch predictor of claim 1, further comprising: wherein the event counted comprises a cycle of a clock of an external bus to which the processor is coupled.
6. The branch predictor of claim 1, further comprising: wherein the event counted comprises a cycle of a core clock of the processor.
7. The branch predictor of claim 1, further comprising: wherein the random value is from a set of possible values; wherein when the direction predicted by the branch predictor matches the correct direction, the branch predictor allocates a new entry in one or more of the plurality of memory banks if the random value is one of a predetermined subset of the set of possible values and otherwise does not allocate a new entry in one or more of the plurality of memory banks.
8. The branch predictor of claim 1, further comprising: wherein the random value is from a set of possible values; wherein when the direction predicted by the branch predictor does not match the correct direction, the branch predictor allocates a new entry in more than one of the plurality of memory banks if the random value is one of a predetermined subset of the set of possible values and otherwise allocates a new entry in one of the plurality of memory banks.
9. The branch predictor of claim 1, further comprising: wherein the random value is from a set of possible values; wherein each bank of the plurality of memory banks receives an index computed using a different length of a branch history pattern, and each bank of the plurality of memory banks has a number, and the bank numbers increase sequentially from shortest length to longest length; and wherein each entry in the plurality of memory banks includes an indicator that indicates whether the entry has tended to be useful in predicting the direction of branch instructions; wherein when allocating an entry in the plurality of banks in response to the processor determining the correct direction of a branch instruction whose direction was predicted by a first bank of the plurality of memory banks having a bank number X, the branch predictor begins searching for a non-useful entry to allocate at bank numbered X+1 if the random value is one of a predetermined subset of the set of possible values and otherwise begins searching for a non-useful entry to allocate at bank numbered X+2.
10. The branch predictor of claim 1, further comprising: wherein each entry in the plurality of memory banks includes an indicator that indicates whether the entry has tended to be useful in predicting the direction of branch instructions; a counter incremented by a number of banks whose indicator indicates its entry has tended to be useful and decremented by a number of banks whose indicator indicates its entry has tended not to be useful; and wherein the branch predictor probabilistically decrements the indicators when the counter has a value greater than or equal to the random value.
11. A method of operating a branch predictor that has a plurality of memory banks having entries that hold prediction information used to predict a direction of branch instructions fetched and executed by a processor that comprises the branch predictor, the method comprising: maintaining a count of events that occur in the processor; performing an arithmetic and/or logical operation on predetermined bits of the count to generate a random value; and in response to the processor determining a correct direction of a branch instruction predicted by the branch predictor, using the generated random value to make a decision about updating the memory banks.
12. The method of claim 11, further comprising: maintaining a branch history pattern that specifies a history of directions of branch instructions encountered by the processor; and in addition to performing the arithmetic and/or logical operation on predetermined bits of the count, performing the arithmetic and/or logical operation also on predetermined bits of the branch history pattern to generate the random value.
13. The method of claim 11, further comprising: wherein the arithmetic and/or logical operation comprises a Boolean exclusive-OR (XOR) operation of a first portion of bits of the count with a second portion of bits of the count.
14. The method of claim 11, further comprising: wherein the event counted comprises a retire of an instruction by the processor.
15. The method of claim 11, further comprising: wherein the event counted comprises a cycle of a clock of an external bus to which the processor is coupled.
16. The method of claim 11, further comprising: wherein the event counted comprises a cycle of a core clock of the processor.
17. The method of claim 11, further comprising: wherein the random value is from a set of possible values; when the direction predicted by the branch predictor matches the correct direction, allocating a new entry in one or more of the plurality of memory banks if the random value is one of a predetermined subset of the set of possible values and otherwise not allocating a new entry in one or more of the plurality of memory banks.
18. The method of claim 11, further comprising: wherein the random value is from a set of possible values; when the direction predicted by the branch predictor does not match the correct direction, allocating a new entry in more than one of the plurality of memory banks if the random value is one of a predetermined subset of the set of possible values and otherwise allocating a new entry in one of the plurality of memory banks.
19. The method of claim 11, further comprising: wherein the random value is from a set of possible values; wherein each bank of the plurality of memory banks receives an index computed using a different length of a branch history pattern, and each bank of the plurality of memory banks has a number, and the bank numbers increase sequentially from shortest length to longest length; and wherein each entry in the plurality of memory banks includes an indicator that indicates whether the entry has tended to be useful in predicting the direction of branch instructions; when allocating an entry in the plurality of banks in response to the processor determining the correct direction of a branch instruction whose direction was predicted by a first bank of the plurality of memory banks having a bank number X, beginning to search for a non-useful entry to allocate at bank numbered X+1 if the random value is one of a predetermined subset of the set of possible values and otherwise beginning to search for a non-useful entry to allocate at bank numbered X+2.
20. The method of claim 11, further comprising: wherein each entry in the plurality of memory banks includes an indicator that indicates whether the entry has tended to be useful in predicting the direction of branch instructions; incrementing a counter by a number of banks whose indicator indicates its entry has tended to be useful and decrementing the counter by a number of banks whose indicator indicates its entry has tended not to be useful; and probabilistically decrementing the indicators when the counter has a value greater than or equal to the random value.
21. A computer program product encoded in at least one non-transitory computer usable medium for use with a computing device, the computer program product comprising: computer usable program code embodied in said medium, for specifying a branch predictor, the computer usable program code comprising: first program code for specifying a plurality of memory banks having entries that hold prediction information used to predict a direction of branch instructions fetched and executed by a processor that comprises the branch predictor; second program code for specifying a count of events that occur in the processor; third program code for specifying hardware logic that performs an arithmetic and/or logical operation on predetermined bits of the count to generate a random value; and wherein in response to the processor determining a correct direction of a branch instruction predicted by the branch predictor, the branch predictor uses the random value generated by the hardware logic to make a decision about updating the memory banks.
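The claims recite several distinct uses of the random value. The following Python sketch models four of them in isolation: the XOR of two portions of the count (claims 3 and 13), the optional branch history term (claims 2 and 12), the X+1 / X+2 starting bank for allocation (claims 9 and 19), and the useful-indicator counter compared against the random value (claims 10 and 20). All function names are hypothetical. The choice of bit slices and the modeling of the "predetermined subset" as values below a threshold are assumptions. Details the claims leave open, such as counter width, saturation, and when the counter is updated, are not modeled.

```python
from typing import Iterable, Optional


def random_value(count: int, history: Optional[int] = None, width: int = 8) -> int:
    """XOR the low width bits of the count with the next width bits
    (claims 3 and 13). If a branch history pattern is supplied, its low
    width bits are XORed in as well (claims 2 and 12).
    Assumption: the specific bit positions are illustrative only."""
    mask = (1 << width) - 1
    value = (count & mask) ^ ((count >> width) & mask)
    if history is not None:
        value ^= history & mask
    return value


def allocation_start_bank(x: int, rnd: int, threshold: int) -> int:
    """Claims 9 and 19: begin searching for a non-useful entry at bank X+1
    if rnd is in the predetermined subset, otherwise at bank X+2.
    Assumption: the subset is modeled as all values below threshold."""
    return x + 1 if rnd < threshold else x + 2


def update_useful_counter(counter: int, useful_flags: Iterable[bool]) -> int:
    """Claims 10 and 20: add one for each bank whose entry is marked useful
    and subtract one for each bank whose entry is marked not useful.
    Assumption: no saturation or width limit is applied."""
    flags = list(useful_flags)
    useful = sum(flags)
    return counter + useful - (len(flags) - useful)


def should_decrement_useful(counter: int, rnd: int) -> bool:
    """Claims 10 and 20: decrement the useful indicators when the counter
    is greater than or equal to the random value."""
    return counter >= rnd


if __name__ == "__main__":
    rnd = random_value(0x1234, history=0b1011)
    counter = update_useful_counter(0, [True, True, False, True])
    print(rnd, allocation_start_bank(2, rnd, threshold=64),
          counter, should_decrement_useful(counter, rnd))
```

Because the counter grows as more entries are marked useful, a higher counter makes it more likely to meet or exceed a uniformly distributed random value, so under these assumptions the indicators are aged more often when many entries are marked useful.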